Abstract. We survey some work concerned with small universal Turing 
machines, cellular automata, tag systems, and other simple models of 
computation. For example it has been an open question for some time 
as to whether the smallest known universal Turing machines of Minsky, 
Rogozhin, Baiocchi and Kudlek are efficient (polynomial time) simulators
of Turing machines. These are some of the most intuitively simple 
computational devices and previously the best known simulations were 
exponentially slow. We discuss recent work that shows that these ma- 
chines are indeed efficient simulators. In addition, another related result 
shows that Rule 110, a well-known elementary cellular automaton, is ef- 
ficiently universal. We also discuss some old and new universal program 
size results, including the smallest known universal Turing machines. 
We finish the survey with results on generalised and restricted Turing 
machine models including machines with a periodic background on the 
tape (instead of a blank symbol), multiple tapes, multiple dimensions, 
and machines that never write to their tape. We then discuss some ideas 
for future work. 



1 Introduction 

In this survey we explore results related to the time and program size complexity 
of universal Turing machines, and other models of computation. We also discuss 
results for variants on the Turing machine model to give an idea of the many 
strands of work in the area. Of course the choice of topics is incomplete and 
reflects the authors' interests, and there are other related surveys that may 
interest the reader [32,38,41,57]. 

In 1956 Shannon [95] considered the question of finding the smallest possible 
universal Turing machine [99], where size is the number of states and symbols. 
In the early Sixties, Minsky and Watanabe had a running competition to see 
who could find the smallest universal Turing machine [51,54,103,104]. Early at- 
tempts [23,104] gave small universal Turing machines that efficiently (in poly- 
nomial time) simulated Turing machines. In 1962, Minsky [54] found a small 
7-state, 4-symbol universal machine. Minsky's machine worked by simulating 
2-tag systems, which were shown to be universal by Cocke and Minsky [8,55]. 
Rogozhin [88] extended Minsky's technique of 2-tag simulation and found small 
machines with a number of state-symbol pairs. Subsequently, some of Rogozhin's 
machines were reduced in size or improved by Robinson [86,91], Kudlek and Ro- 
gozhin [27], and Baiocchi [4]. All of the smallest known 2-tag simulators are 
plotted as circles in Figure 1. Also, Table 1 lists a number of these machines. 

Unfortunately, Cocke and Minsky's 2-tag simulation of Turing machines was 
exponentially slow. The exponential slowdown was essentially caused by the use 
of a unary encoding of Turing machine tape contents. Therefore, for many years 
it was entirely plausible that there was an exponential trade-off between program 
size complexity on the one hand, and time/space complexity on the other: the 
smallest universal Turing machines seemed to be exponentially slow. 

Figure 1 shows a non-universal curve. This curve is a lower bound that gives 
the state-symbol pairs for which it is known that the halting problem is decidable. 
The 1-symbol case is trivial and Shannon [95] claimed that 1-state Turing ma- 
chines are non-universal. However, both Fischer [12] and Nozaki [70] noted that 
Shannon's definition of universal Turing machine is too strict and so his proof 
is not sufficiently general. Later, non-universality of the 1-state case was shown by Herman [19]. 
Pavlotskaya [75] and, via another method, Kudlek [26] have shown that there are 
no universal 2-state, 2-symbol machines, where one transition rule is reserved 
for halting. Pavlotskaya [77] has also shown that there are no universal 3-state, 
2-symbol machines, and also claimed [75], without publishing a proof, that there 
are no universal machines for the 2-state, 3-symbol case. Again, both of these 
cases assume that a transition rule is reserved for halting. 

2 Time and size efficiency of universal machines 

As mentioned above, some of the very earliest small Turing machines were poly- 
nomial time simulators. Subsequently, attention turned to the smaller, but ex- 
ponentially slower, 2-tag simulators given by Minsky, Rogozhin and others. 

Recently [65] we have given small machines that are efficient polynomial time 
simulators. More precisely, if M is a deterministic single-tape Turing machine 
that runs in time t and space s, then there are machines, with state-symbol 
pairs given by the squares in Figure 1, that directly simulate M in polynomial 
time O(t^2) and linear space O(s). These machines define an O(t^2) curve. They 
are currently the smallest known universal Turing machines that simulate Turing 
machines in O(t^2) time. Their O(s) space usage is also extremely efficient, more 
efficient than the other machines in Figure 1, all of which use space that is up 
to the square root of their simulation time. 
Legend for Fig. 1 (simulation type, time overhead and source for each group of machines):

universal, direct simulation, O(t^2), [65] 
universal, 2-tag simulation, O(t^4 log^2 t), [4,27,91] 
universal, bi-tag simulation, O(t^6), [69] 
semi-weakly universal, direct simulation, O(t^2), [105] 
semi-weakly universal, cyclic-tag simulation, O(t^4 log^2 t) 
weakly universal, Rule 110 simulation, O(t^4 log^2 t), [68] 

Fig. 1. State-symbol plot of small universal Turing machines. The type of simulation
is given for each group of machines. Simulation time overheads are given 
in terms of simulating a single-tape deterministic Turing machine that runs in 
time t. 



Despite the existence of these efficient O(t^2) simulators, it still remained the 
case that the smallest universal machines were exponentially slow. However, we 
have recently shown that the smallest machines are in fact efficient simulators 
of Turing machines, by showing that 2-tag systems are efficient [108]. Tag sys- 
tems are one of a number of rewriting systems invented in the 1920s by Post, 
although published somewhat later [79]. Post wanted to prove the decidability 
of various properties of tag systems, but found that even very simple examples 
had extremely complicated behaviour. Forty years later, Minsky showed that 
tag systems [53] are in fact computationally universal, and then Cocke and Min- 
sky [8,55] showed universality for a particularly simple form called 2-tag systems. 
Minsky [54,55] saw that one could find very small universal Turing machines by 
simulating 2-tag systems, and since then 2-tag systems have been at the core of 
many results in the field. 

A 2-tag system acts on a dataword, which is a string of symbols taken from a 
finite alphabet Σ. There is a fixed set of rules R: Σ → Σ*. In a single timestep, 
the leftmost symbol σ_j of the dataword is read; if there is a rule σ_j → α_j then 
the string α_j is appended to the right of the dataword and the leftmost two 
dataword symbols are deleted. This process is iterated until a suitable halting 
condition is reached (i.e. there is no rule for the read symbol, the dataword has 
length less than 2, or the 2-tag system enters a repeating loop). Part of the 
reason why it was presumed that 2-tag systems were exponentially slow is that 
it is not obvious how to locate a specific symbol based solely on its position 
relative to other symbols in the dataword (one might want to do this to simulate 
the local action of a Turing machine tape head). The main result of [108] uses 
an algorithm that solves this problem, and does so efficiently. 
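
The 2-tag update step described above can be sketched in a few lines of Python. This is only an illustration of the model, not the efficient algorithm of [108]: the function name run_2tag, the dictionary encoding of the rules, and the max_steps cap (standing in for detection of a repeating loop) are our own assumptions.

```python
def run_2tag(rules, dataword, max_steps=10_000):
    """Run a 2-tag system until it halts or max_steps is reached.

    rules maps a symbol to the string appended when that symbol is read.
    The run stops when the read symbol has no rule or the dataword has
    length less than 2. Returns (final_dataword, steps_taken).
    """
    word = list(dataword)
    steps = 0
    while steps < max_steps and len(word) >= 2 and word[0] in rules:
        appendant = rules[word[0]]
        # Delete the two leftmost symbols, then append on the right.
        word = word[2:] + list(appendant)
        steps += 1
    return "".join(word), steps


if __name__ == "__main__":
    # Illustrative rule set: a -> bc, b -> a, c -> aaa.
    print(run_2tag({"a": "bc", "b": "a", "c": "aaa"}, "aaa"))
```

Note that only the first symbol is ever inspected, while two symbols are deleted per step; this is the reason that locating a symbol by its relative position is not straightforward.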

More precisely, given a deterministic single-tape Turing machine M that 
runs in time t, there is a 2-tag system that simulates M and runs in polynomial 
time O(t^4 log^2 t). The small machines of Minsky, Rogozhin, and others have a 
quadratic time overhead when simulating 2-tag systems, hence by the result 
in [108] they simulate Turing machines in time O(t^8 log^4 t). It turns out that 
the time overhead can be improved [63] to O(t^4 log^2 t), giving the O(t^4 log^2 t) 
time overhead for the machines shown in Figure 1 as hollow circles. Thus, there 
is currently little evidence for the claim of an exponential trade-off between 
program size complexity, and time / space complexity. 

From the point of view of program size, Neary and Woods [63,69] have recently
given four Turing machines that are presently the smallest known (standard)
machines with 2, 3, 4 and 5 symbols. The 5-symbol machine improves on 
the 5-symbol machine of Rogozhin [91] by one transition rule. The remainder of 
these machines improve on the 2- and 4-symbol machines of Baiocchi [4], and 
the 3-symbol machine of Rogozhin [91]. These small machines simulate Turing 
machines in polynomial time O(t^6) and are illustrated as triangles in Figure 1. 
They were proven universal via simulation of our universal variant of tag sys- 
tems called bi-tag systems [69]. Bi-tag systems are essentially 1-tag systems (and 
so they read and delete one symbol per timestep) augmented with additional 
context sensitive rules that read, and delete, two symbols per timestep. Bi-tag 
systems are a restriction of Post's normal systems [79]. On the one hand bi-tag 
systems are universal, while on the other hand they are sufficiently 'simple' to 
be simulated by such small machines. 

Exponentially improving the time efficiency of 2-tag systems has implications 
for a number of models of computation, besides small universal Turing machines. 
Following our result, the simulation efficiency of many biologically inspired mod- 
els of computation, including neural networks, H systems and P systems, has 
been improved from exponential to polynomial. For example, Siegelmann and 
Margenstern [96] give a neural network that uses only nine high-order neurons 
to simulate 2-tag systems. Taking each synchronous update of the nine neurons 
as a single parallel timestep, their neural network simulates 2-tag systems in 
linear time. They note that "tag systems suffer a significant slow-down ... and 
thus our result proves only Turing universality and should not be interpreted 
complexity-wise as a Turing equivalent." Now we know that their neural network
is in fact efficiently universal. Rogozhin and Verlan [93] give a tissue P 
system with eight rules that simulates 2-tag systems in linear time, and thus 
we have improved its simulation time overhead from exponential to polynomial. 
This system uses splicing rules (from H systems) with membranes (from P sys- 
tems) and is non-deterministic. Harju and Margenstern [18] gave an extended 
H-system with 280 rules that generates recursively enumerable sets using Ro- 
gozhin's 7-state, 4-symbol universal Turing machine. Using our result from 2-tag 
systems, the time efficiency of their construction is improved from exponential 
to polynomial, with a possible small constant increase in the number of rules. 
The efficiency of Hooper's [22] small 2-tape universal Turing machine is also 
improved from exponential to polynomial, as is Rothemund's [94] restriction 
enzyme implementation of Minsky's 7-state, 4-symbol UTM. The technique of 
simulation via 2-tag systems is at the core of many of the universality proofs in 
Margenstern's survey [41]. Our work exponentially improves the time overheads 
in these simulations, such as Lindgren and Nordahl's cellular automata [31], 
Margenstern's non-erasing Turing machines [34,36], and Robinson's tiling [85]. 

3 Non-standard universal Turing machines: time 
efficiency and program size 

So far we have been discussing results for universal Turing machines that have 
one tape, one tape head, and are deterministic (we often refer to this setup as 
the standard model). Of course one can consider results for other variants of 
the model. There are many generalised models, for example allowing multiple 
tapes, multiple dimensions, or even coupling the Turing machine with a finite 
automaton. Restricted models include non-printing, non-erasing and reversible 
Turing machines, and machines with restricted instructions. In this section we 
explore program size and time complexity results for a number of generalised 
and restricted models. Table 2 contains program size results for a number of 
such non-standard machines. 

3.1 Weak universality and Rule 110 

An interesting generalisation occurs when we stick to the standard conventions, 
but we allow the blank portion of the tape to contain a word that is constant 
(independent of the input), and is repeated infinitely often in one direction, say to 
the left of the input. We say that such Turing machines are semi-weakly universal. 
Some of the earliest small universal Turing machines were semi-weak [104,105]. 
Sometimes another word is also repeated infinitely often to the right. Universal 
machines that use this setup are called weakly universal [43]. 

It is not difficult to see how this generalisation can help to reduce program 
size. For example, it is typical of small universal Turing machine simulations that 
the program being simulated is stored on the tape. When reading an instruction 
we often mark certain symbols. At a later time we then restore marked symbols 
to their original values. If the simulated program is repeated infinitely often, say 
to the left of the input, things may be much easier as we can simply skip the 
'restore' phase of our algorithm and access a new copy of the program when 
simulating the next instruction, thus reducing the universal program's size. 

This was the strategy used by Watanabe [104,105] to find the semi-weak, 
direct Turing machine simulators shown as hollow diamonds in Figure 1. Recently
[111] we have given three new semi-weakly universal machines and these 
are shown as solid diamonds in Figure 1. These machines simulate cyclic tag 
systems [9]. It is interesting to note that two of our machines are symmetric 
with those of Watanabe (around the line where states = symbols), despite the 
fact that we use a different simulation technique. Our 4-state, 5-symbol machine 
has only 17 transition rules, making it the smallest known semi-weakly universal
machine (Watanabe's 5-state, 4-symbol machine has 18 transition rules, 
and his 7-state, 3-symbol machine has 21 rules [105])^1. The time overhead for 
these machines is polynomial. More precisely, if M is a single-tape deterministic 
Turing machine that runs in time t, then M is simulated by any of our semi-weak
machines in time O(t^4 log^2 t). Watanabe's semi-weak machines also ran in 
polynomial time, with a very efficient time overhead of O(t^2). 

Cook, Eppstein, and Wolfram [9,107] gave weakly universal Turing machines 
that were significantly smaller than the existing semi-weak machines. These were 
improved upon by Neary and Woods [68] to give the smallest known weakly 
universal machines. In (states, symbols) notation their sizes are (2, 4), (3, 3) and 
(6, 2), and they are illustrated in Figure 1. These machines work by simulating 
Rule 110, a very simple kind of cellular automaton. Rule 110 is an elementary 
cellular automaton, which means that it is a one-dimensional, nearest neighbour, 
binary cellular automaton [106]. More precisely, it is composed of a sequence of 
cells ... p_{-1} p_0 p_1 ... where each cell has a binary state p_i ∈ {0, 1}. At timestep 
t + 1 the value of cell p_{i,t+1} = F(p_{i-1,t}, p_{i,t}, p_{i+1,t}) is given by the synchronous 
local update function F 

F(0,0,0) = 0    F(1,0,0) = 0 
F(0,0,1) = 1    F(1,0,1) = 1 
F(0,1,0) = 1    F(1,1,0) = 1 
F(0,1,1) = 1    F(1,1,1) = 0 

Rule 110 was shown to be universal via an impressive and detailed simulation of 
cyclic tag systems, the result is stated and described in [107] and the full proof 
is given in [9]. In the proof, the Rule 110 instance has a special (constant) word 
repeated infinitely to the left of the input, and another to the right. Rule 110 
has a very simple update rule which facilitates the writing of very small weak 
Turing machines to simulate it. 
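
To make the update function F concrete, the following is a minimal Python sketch of one synchronous Rule 110 step. The names RULE_110 and step are our own, and the sketch works on a finite row whose outside neighbours are taken to be 0; this boundary choice is a simplifying assumption and differs from the periodic backgrounds used in the universality proof.

```python
# Lookup table for F(left, centre, right), as given by the Rule 110 update function.
RULE_110 = {
    (0, 0, 0): 0, (1, 0, 0): 0,
    (0, 0, 1): 1, (1, 0, 1): 1,
    (0, 1, 0): 1, (1, 1, 0): 1,
    (0, 1, 1): 1, (1, 1, 1): 0,
}


def step(cells):
    """Apply one synchronous update to a finite row of binary cells.

    Assumption: cells outside the row are fixed at 0.
    """
    padded = [0] + list(cells) + [0]
    return [
        RULE_110[(padded[i - 1], padded[i], padded[i + 1])]
        for i in range(1, len(padded) - 1)
    ]


if __name__ == "__main__":
    row = [0, 0, 0, 0, 0, 0, 0, 1]
    for _ in range(6):
        print("".join(map(str, row)))
        row = step(row)
```

Reading the eight outputs for neighbourhoods 111 down to 000 as a binary number gives 01101110, that is 110, which is where the rule gets its name.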

As noted, Rule 110 was shown to be universal by simulating cyclic tag sys- 
tems, which in turn simulate 2-tag systems. The chain of simulations included 
the exponentially slow 2-tag algorithm of Cocke and Minsky, thus Rule 110, and 
the weakly universal machines that simulate it, were exponentially slow. In a re- 
cent paper [64] we have improved their simulation time overhead to polynomial 
by showing that cyclic tag systems are efficient simulators of Turing machines. In 
doing so, we solved what Cook [10] has called the "geometry problem of cyclic- 
state tape processors." The difficulty in overcoming this problem is that there is 
no obvious way for the system to efficiently determine which symbols or objects 
are adjacent to each other. Previous works used unary encodings as it was not 
obvious how to determine the relative positions of adjacent digits in a sequence. 
Our main result was in providing an efficient solution to this problem. 

^1 Watanabe mentions that he found a (7, 3) universal machine with 21 transition rules 
in reference [105]. We have not found the details of this machine, however the most 
reasonable inference from the literature is that it is semi-weakly universal. 

Our result has interesting implications for Rule 110. For example, given an 
initial configuration of Rule 110, and a value t in unary, predicting t timesteps of a 
Rule 110 computation is P-complete. Therefore, unless P = NC, which is widely 
believed to be false, we cannot hope to quickly (in polylogarithmic time) predict 
the evolution of this simple cellular automaton even if we have a polynomial 
amount of parallel hardware. Rule 110 is the simplest (one-dimensional, near- 
est neighbour) cellular automaton that has been shown to have a P-complete 
prediction problem. In particular, Ollinger's [71] intrinsic universality result 
already shows that prediction for one dimensional nearest neighbour cellular 
automata is P-complete for 6 states (later improved to 4 states by Richard 
and Ollinger [84,72]), and our result improves this to 2 states. The question of 
whether Rule 110 prediction is P-complete has been asked, directly or indirectly, 
in a number of previous works (for example [2,58,59]). 

It is currently unknown whether all of the lower bounds in Figure 1 hold for 
weak machines. For example, the non-universality results of Pavlotskaya were 
proven for the case where one transition rule is reserved for halting, however the 
smallest weak machines do not halt. 

3.2 Other non-standard universal Turing machines 

Weakness has not been the only generalisation on the standard model in the 
search for ever smaller universal machines. We give some notable examples here; 
many others are to be found in Table 2. 

Before Shannon's famous paper, Moore [60] observed that 2-symbol machines 
were universal as any Turing machine could be converted into a 2-symbol ma- 
chine by the (now) usual encoding. In the same paper Moore used this observa- 
tion to give a universal 3-tape machine with 15 states and 2 symbols. Moore's 
machine uses only 57 instructions, each instruction being a sextuple that either 
moves one of its tape heads or prints a single symbol to one of its tapes. One 
of the tapes in Moore's 3-tape machine is circular and contains the simulated 
program, therefore his machine also operates correctly if the circular tape is replaced
with a one-way infinite tape with a periodic background (i.e. semi-weak). 
Moore's result has been largely ignored in the literature despite being the first 
published small universal Turing machine. Interestingly, Moore's paper cites un- 
published work by Shannon on the universality of non-erasing machines. 

Hooper [21,22] gave universal machines with 2 states, 3 symbols and 2 tapes, 
and with 1 state, 2 symbols and 4 tapes. One of the tapes in Hooper's 4-tape 
machine is circular and contains the simulated program, and so could be replaced
by a one-way infinite tape with a periodic background (i.e. semi-weak). 
Priese [82] gave a 2-state, 4-symbol machine with a 2-dimensional tape, and a 
2-state, 2-symbol machine with 2 tape heads and a 2-dimensional tape. Mar- 
genstern and Pavlotskaya [46,45] gave a 2-state, 3-symbol Turing machine that 
uses only 5 instructions and is universal when coupled with a finite automaton. 
They also showed that the halting problem is decidable for such machines with 
4 instructions [46]. 



3.3 Restricted universal Turing machines 



If we suitably restrict the standard Turing machine model the problem of find- 
ing universal machines with small state-symbol products becomes more difficult. 
Over the years, a number of authors have looked at non-erasing Turing machines, 
that is machines that are permitted to overwrite blank symbols only. Moore [60] 
mentions that Shannon had proved that such non-erasing Turing machines sim- 
ulate arbitrary Turing machines, however Shannon's work was never published. 
Shortly after, Shannon published a proof that 2-symbol Turing machines are universal,
and Wang [101] proved that 2-symbol non-erasing Turing machines are 
universal. Later, Minsky proved the same result as Wang, but using the tech- 
nique of simulation via non-writing Turing machines, yet another (universal) 
restriction [53]. 

Margenstern has examined the universality of 2-symbol Turing machines for 
a number of different restrictions. One such restriction is the number of colours of 
a machine, defined as the number of distinct triples (a, D, S), where a is the read 
symbol, D is the move direction, and S is the write symbol of a transition rule. 
Pavlotskaya [75,76] has shown that there are standard universal Turing machines 
with 3 colours and no standard universal Turing machines with 2 colours. Mar- 
genstern [34] has shown that there are non-erasing universal Turing machines 
with 5 colours and no non-erasing universal Turing machines with 4 colours. 
Laterality number is another property examined by Margenstern. The laterality 
number of a Turing machine is defined as the minimum of the number of left 
move instructions and the number of right move instructions. Margenstern and 
Pavlotskaya [75,44] have shown that there are universal Turing machines with 
laterality number 2 and no universal machines with laterality number 1. Mar- 
genstern [36,39] has shown that there are universal non-erasing Turing machines 
with laterality number 3 and no universal non-erasing machines with laterality 
number 2. For more on these results see [33,34,35,36,37,42]. 
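
As a small illustration of the two measures just defined, the Python sketch below computes the number of colours and the laterality number of a transition table. The tuple layout (state, read, write, move, next_state), the function names, and the toy rule list are hypothetical choices made for this example; the definitions follow the wording of this section rather than the cited papers.

```python
def colours(rules):
    """Number of distinct (read symbol, move direction, write symbol) triples."""
    return len({(read, move, write) for (_, read, write, move, _) in rules})


def laterality(rules):
    """Minimum of the number of left-move and right-move instructions."""
    left = sum(1 for rule in rules if rule[3] == "L")
    right = sum(1 for rule in rules if rule[3] == "R")
    return min(left, right)


# A toy 2-state, 2-symbol rule list (not a universal machine).
TOY_RULES = [
    ("q0", 0, 1, "R", "q1"),
    ("q0", 1, 1, "L", "q0"),
    ("q1", 0, 1, "R", "q0"),
    ("q1", 1, 0, "R", "q1"),
]

if __name__ == "__main__":
    print(colours(TOY_RULES), laterality(TOY_RULES))
```

In the toy table the first and third rules share the triple (0, R, 1), so they contribute a single colour.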

Fischer [12] gives a number of universality results for Turing machines that 
use restricted forms of transition rules. In one result he proves that 3-state Post 
machines are universal (Post machines [80] are like Turing machines, except 
that in a single timestep they can move or write, but not both). Interestingly, 
Aanderaa and Fischer [1] show that the halting problem for 2-state Post machines 
is decidable. 

Bennett [5] has shown that 3-tape reversible Turing machines are universal. 
Morita and others have since shown universality results for reversible Turing 
machines with 1 tape and 2 symbols [61], and 17 states and 5 symbols [62]. 
3.4 Universal Turing machines with multidimensional tapes: time 
efficiency and program size 

During the 1970s a number of authors [82,25,100] were interested in finding 
small universal Turing machines with multidimensional tapes. The machines of 
these authors have not, to our knowledge, been analysed from the perspective 
of time/space complexity. We discuss this topic here. 

Lutz Priese [82] gives a 2D machine with 2 states and 4 symbols that is uni- 
versal on finite initial conditions (i.e. all except a finite number of symbols are 
initially blank), and another 2D machine with 2 states, 2 symbols and 2 tape 
heads that is derived from this 4-symbol machine. Priese's machines simulate 
counter machines (also called register machines), via a sequence of reductions. 
Given a counter machine that runs in time t, Priese's machines simulate its 
computation in time O(t^2) and space O(t). Due to the unary encoding used by 
counter machines [13], both of Priese's machines simulate Turing machines with 
an exponential time overhead. Priese's machines do not end their computation 
in the conventional manner of halting on a state-symbol pair that has no tran- 
sition rule: instead there is a choice, via the initial input encoding, of ending a 
computation either by entering a sequence of 6 repeating configurations or by 
halting when an attempt is made to move off the edge of the 2D tape. 

Langton's ant [29] is usually described as an ant that lives on a 2D grid of 
binary-valued cells. The ant chooses which adjacent cell to move to based on 
(a) the current cell's binary value and (b) the ant's current orientation. The ant 
flips the current cell's bit as it moves away. So Langton's ant is a 2D Turing 
machine with 2 symbols and 4 states (North, South, East and West). Gajardo et 
al. [15] showed that predicting the behaviour of the ant is P-hard, by simulating 
Boolean circuits in polynomial time. By then showing how the ant can simulate 
an infinite sized circuit (with a simple repeating structure), which in turn can 
simulate the space-time diagram of a cellular automaton (CA), they prove that 
Langton's ant is weakly universal in 2D. 
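
The view of Langton's ant as a 2D Turing machine with 2 symbols and 4 states can be sketched as follows. The turning convention used here (turn right on a 0-cell, turn left on a 1-cell) is the commonly used one and is an assumption on our part, as are the names HEADINGS, ant_step and run_ant and the choice to store the grid as a set of cells holding 1.

```python
# The 4 states are headings, listed clockwise: North, East, South, West.
# Coordinates: x grows to the east, y grows to the north.
HEADINGS = [(0, 1), (1, 0), (0, -1), (-1, 0)]


def ant_step(ones, pos, heading):
    """Advance the ant by one step; ones is the set of cells holding 1."""
    if pos in ones:
        heading = (heading - 1) % 4  # on a 1-cell: turn left
        ones.remove(pos)             # flip the bit to 0
    else:
        heading = (heading + 1) % 4  # on a 0-cell: turn right
        ones.add(pos)                # flip the bit to 1
    dx, dy = HEADINGS[heading]
    return (pos[0] + dx, pos[1] + dy), heading


def run_ant(steps):
    """Run the ant from the origin, facing North, on an all-0 grid."""
    ones, pos, heading = set(), (0, 0), 0
    for _ in range(steps):
        pos, heading = ant_step(ones, pos, heading)
    return ones, pos, heading


if __name__ == "__main__":
    ones, pos, heading = run_ant(1000)
    print(len(ones), pos, heading)
```

The transition depends only on the current cell's bit and the heading, which is exactly the (symbol, state) pair read by a Turing machine.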

It is worth pausing to describe a form of weak universality in 2D, where 
the tape has a background that is ultimately periodic in both dimensions of 
a single quadrant. A one-way infinite sequence is ultimately periodic [14] if it is 
of the form s_1 s_2^ω where s_2^ω = s_2 s_2 s_2 ..., and s_1 and s_2 are finite sequences. We 
say that an ℕ × ℕ pattern is ultimately periodic in the x direction if for each 
y ∈ ℕ the infinite sequence of symbols at the coordinates (0, y), (1, y), (2, y), ... 
is ultimately periodic. This is defined analogously for the y direction. 
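
An ultimately periodic sequence is fully determined by two finite sequences, so any of its symbols can be computed by index arithmetic. The sketch below illustrates this for the one-dimensional case; the function name ultimately_periodic and its arguments prefix and period (playing the roles of the two finite sequences) are our own naming.

```python
def ultimately_periodic(prefix, period):
    """Return a function mapping i to the i-th symbol of the infinite
    sequence that starts with prefix and then repeats period forever."""
    if not period:
        raise ValueError("period must be non-empty")

    def symbol_at(i):
        if i < len(prefix):
            return prefix[i]
        return period[(i - len(prefix)) % len(period)]

    return symbol_at


if __name__ == "__main__":
    s = ultimately_periodic("ab", "xyz")
    print("".join(s(i) for i in range(11)))
```

A 2D background that is ultimately periodic in the x direction could be represented in the same spirit by one such function per row.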

Kleine-Büning and Ottmann [25] give universal Turing machines which have 
a single multidimensional tape, a number of which are weakly universal. Remarkably,
their 2D, 2-state, 3-symbol machine does not even print to the tape! 
The two counter values of a simulated 2-counter machine are encoded by the 
(x, y) position of the tape head on the 2D tape. Testing for zero amounts to 
detecting one of the axes. It is well-known that 2-counter machines are universal
[56]. However, using known algorithms, 2-counter machines suffer from a 
doubly-exponential slowdown when simulating Turing machines [97], and so the 
2-state, 3-symbol machine of Kleine-Büning and Ottmann also suffers from a 
doubly-exponential slowdown when simulating Turing machines. We give a brief 
overview of this machine's computation. 

The 2D tape uses only the upper-right quadrant of the plane and so each tape 
cell may be indexed by a coordinate of the form (x, y) ∈ ℕ × ℕ. The quadrant 
is filled using 4, infinitely repeated, finite square blocks (of tape symbols) which 
we will call A, B, C, and D. The infinite pattern on the 2D tape given by the 
arrangement of these blocks is ultimately periodic in both the x and y directions. 
Each block is of size O(r^2) where r is the number of instructions in the 2-counter 
machine being simulated. The block at the origin of the quadrant is of type A. 
Types B and C are repeated along the x-axis and y-axis, respectively, and 
the remainder of the quadrant is tiled by blocks of type D. Each block encodes the 
entire program of the 2-counter machine being simulated. The current counter 
machine instruction being simulated is given by the position of the tape head 
within a block. If the counters have values x_1 and y_1 respectively, then the tape 
head will be in the x_1-th block from the y-axis and the y_1-th block from the x-axis.
The blocks contain specially defined paths that the tape head follows to 
(a) arrive at the next counter machine instruction and (b) move to one of the 
adjacent blocks if a change in the value of a counter is being simulated. A, B and 
C blocks lie along the axes and so are used to simulate any instruction where one 
or more counters have value zero, and in particular they contain special paths 
that simulate a positive test for zero. 
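
The idea that the two counters are nothing more than a head position in the quadrant, with a zero test amounting to detecting an axis, can be illustrated with a small counter machine interpreter. This is only a sketch of the encoding and does not reproduce the block and path construction of Kleine-Büning and Ottmann; the instruction format ("inc", "dec", "halt") and the name run_counter_machine are hypothetical.

```python
def run_counter_machine(program, x=0, y=0, max_steps=10_000):
    """Run a 2-counter machine whose counters are the coordinates (x, y).

    Instructions (an illustrative format):
      ("inc", c, nxt)               increment counter c (0 for x, 1 for y)
      ("dec", c, nxt_pos, nxt_zero) if counter c is 0 (the head is on an
                                    axis) jump to nxt_zero, else decrement
                                    and jump to nxt_pos
      ("halt",)                     stop and return the position
    """
    pos = [x, y]
    pc = 0
    for _ in range(max_steps):
        instr = program[pc]
        if instr[0] == "halt":
            return tuple(pos)
        if instr[0] == "inc":
            _, c, nxt = instr
            pos[c] += 1
            pc = nxt
        else:  # "dec"
            _, c, nxt_pos, nxt_zero = instr
            if pos[c] == 0:  # zero test: the head has reached an axis
                pc = nxt_zero
            else:
                pos[c] -= 1
                pc = nxt_pos
    raise RuntimeError("step limit reached")


# Example program: move the head to the y-axis while adding x to y.
ADD_X_TO_Y = [("dec", 0, 1, 2), ("inc", 1, 0), ("halt",)]

if __name__ == "__main__":
    print(run_counter_machine(ADD_X_TO_Y, x=3, y=4))
```

Nothing is ever written: the whole machine state is the pair (program counter, position), which mirrors how the non-printing machine keeps its state in the location of the tape head.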

Kleine-Büning and Ottmann adapt their technique to give a non-printing 
1-state, 7-symbol universal machine with a 3D tape. Only 2 planes in the third 
dimension are used, giving tape cells that are indexed by coordinates (x, y, z), 
where x, y ∈ ℕ and z ∈ {0, 1}. The pattern defined by the symbols on each of 
the infinite 2D planes given by (x, y, 0) and (x, y, 1) is ultimately periodic in 
both the x and y directions. The technique used to simulate 2-counter machines 
by the 1-state, 7-symbol machine is, in essence, the same as the technique used 
by the 2-state, 3-symbol machine. The 2D machine uses 2 states to remember 
which path it is following when two different paths cross (the tape head follows 
paths that encode instructions of the counter machine being simulated). With 
the introduction of a third dimension it is no longer necessary for paths to cross 
and so it is possible to give a universal Turing machine with only 1 state. Finally, 
we note that an immediate corollary of this machine's design is the existence of 
a non-halting universal machine with only 6 symbols, as the only purpose of one 
of the 7 symbols is to provide an undefined transition rule for halting. 

It is a fairly straightforward matter to show that for each Turing machine 
with a single, ultimately periodic, 2D tape and no print instructions there is a 
2-counter machine that simulates it in linear time. It immediately follows that 
improvement on the doubly-exponential time overhead when simulating Turing 
machines with such non-printing 2D machines is not possible unless such an 
improvement is also possible for 2-counter machines. Thus, it could be interesting 
to see if the simulation time overhead for such machines can be reduced to singly- 
exponential when a slightly more complicated background is permitted on the 
tape. 
Wagner [100] shows that the halting problem for Turing machines with a 
single kD tape (k ∈ ℕ), 2 symbols and 2 states is decidable^2. Specifically, he 
shows that if such machines halt then they do so in space O(n), where n is input 
length. It is not difficult to give relevant decidability results (such as predicting 
looping or halting) for machines with a single 1D tape and non-printing instructions,
even when an ultimately periodic background is permitted. Regarding kD 
machines, it can be shown for some classes of these machines that only weaker 
forms of universality are possible. For the case of kD non-printing machines, it 
is not difficult to give relevant decidability results when the initial tape contains 
only a finite number of non-blank symbols. Herman has shown that the halting
problem is decidable for 1-state kD printing machines when all but a finite 
number of tape cells are blank at the start of each computation [20]. 

Though lacking in formal rigour, a comparison between the three 2D ma- 
chines we discussed in this section poses some interesting questions about the 
possible trade-offs for different 2D models. For example, out of the three machines
the 2-state, 3-symbol weak machine has the smallest state-symbol product,
is the only non-writing machine, and the only machine that can halt. The 
2-state, 4-symbol machine of Priese is the only machine of the three that does not 
use a periodic (weak) encoding, and the 4-state, 2-symbol machine of Gajardo 
et al. (Langton's ant) is the only machine of the three that simulates Turing 
machines in polynomial time. The best we can hope for with non-printing 2D 
machines is a singly exponential time overhead, but achieving even this bound 
would seem to be very tricky. It is interesting to note that the only non-weak 
2D machine of the three, that of Priese, has an exponential time overhead when 
simulating Tming machines. This is not the case for the smallest non-weak ID 
machines. It begs the question, is there a non-weak 2D machine with the same 
number of states and symbols as Priese's machine that is universal with a poly- 
nomial time overhead? 
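
For readers unfamiliar with Langton's ant, the following minimal Python sketch
simulates its usual rule as a 2D machine whose four states are the headings and
whose two symbols are the cell colours. The turn convention (right on a 0 cell,
left on a 1 cell), the coordinate system and the function name `langtons_ant`
are assumptions made for illustration; the sketch does not reproduce the
universality construction of Gajardo et al. [15].

```python
from typing import Dict, Tuple

Cell = Tuple[int, int]

# Headings in clockwise order: north, east, south, west.
HEADINGS = [(0, 1), (1, 0), (0, -1), (-1, 0)]


def langtons_ant(steps: int) -> Tuple[Dict[Cell, int], Cell, int]:
    """Run the ant from an all-0 grid; return (non-blank cells, position, heading)."""
    grid: Dict[Cell, int] = {}  # only cells holding symbol 1 are stored
    pos: Cell = (0, 0)
    heading = 0                 # index into HEADINGS, initially north
    for _ in range(steps):
        colour = grid.get(pos, 0)
        # Turn right on a 0 cell, left on a 1 cell.
        heading = (heading + (1 if colour == 0 else -1)) % 4
        # Flip the colour of the current cell.
        if colour == 0:
            grid[pos] = 1
        else:
            del grid[pos]
        # Move one cell forward in the new heading.
        dx, dy = HEADINGS[heading]
        pos = (pos[0] + dx, pos[1] + dy)
    return grid, pos, heading


grid, pos, heading = langtons_ant(4)
print(len(grid), pos, heading)  # 4 (0, 0) 0
```

After four steps the ant has flipped a 2 by 2 block of cells and is back at its
starting cell, which is easy to check by hand.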

3.5 Termination of a computation 

As we hope has been made clear so far, it is vitally important to clearly specify 
the computational model one is using when trying to find small universal pro- 
grams or give lower bounds on universal program size. In the absence of a clear 
model description and matching lower bounds, one can never claim to have found 
the "smallest" universal program. Throughout this work we have described re- 
sults on upper bounds and lower bounds on universal program size and we have 
described how both change when the model definition changes. In this section 
we focus on one such issue: computation termination. 

* Machines using Wagner's definition end their computation with a simple loop:
repeatedly executing a special transition rule that does not change the
configuration. This is equivalent to executing a halting transition rule.

A number of authors have given universal Turing machines where successful
computations do not end in a halt state. Many of the machines given in Table 2
are non-halting. What about the problem of proving relevant non-universality
results for these models? Such non-universality results are not achievable by
proving the halting problem decidable. Before we attempt such an endeavour 
we must agree on a clear definition of universal Turing machine. For example, 
instead of specifying the end of a computation by a single halting (or terminal) 
configuration, a computation could end with a specific sequence of configura- 
tions. We refer to this as a terminal configuration sequence. The output of the 
simulated Turing machine is retrieved by applying a recursive decoding function 
to the entire computation (also a configuration sequence). There are many ways
to define a terminal configuration sequence; some examples are:

— a configuration sequence that goes through a specified sequence of states, 

— a configuration sequence that contains two identical configurations, 

— a configuration that contains a specific subword. 

Given a definition of a terminal configuration sequence we may prove that the 
terminal sequence problem (will a machine execute a terminal configuration 
sequence) is decidable. This gives non-universal lower bounds that are relevant to 
universal machines that end their computation with such a sequence. However, 
this result may not hold as a proof of non-universality if we subsequently alter 
our definition of terminal configuration sequence. One more general approach is 
to prove that the terminal sequence problem for all possible terminal sequences, 
of a machine or set of machines, is decidable. In any case, it is important to 
specify these details when giving upper and lower bounds on program size. 
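
The idea of ending a computation with a terminal configuration sequence, rather
than a halt state, can be sketched in a few lines of Python. The encoding of
configurations, the rule format (including a "stay" move of 0), and the names
`step`, `repeats_configuration`, `visits_states` and `run_until_terminal` are all
assumptions made for illustration. The step budget is also an assumption: in
general the terminal sequence problem need not be decidable, so a bounded search
of this kind is only a partial check.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

# A configuration is (state, head position, sorted non-blank cells).
Config = Tuple[str, int, Tuple[Tuple[int, str], ...]]
# Rules map (state, read symbol) to (write symbol, move in {-1, 0, +1}, next state).
Rules = Dict[Tuple[str, str], Tuple[str, int, str]]
BLANK = "0"


def step(cfg: Config, rules: Rules) -> Optional[Config]:
    """Apply one transition; return None if no rule applies."""
    state, head, cells = cfg
    tape = dict(cells)
    key = (state, tape.get(head, BLANK))
    if key not in rules:
        return None
    write, move, nxt = rules[key]
    if write == BLANK:
        tape.pop(head, None)  # keep the representation canonical
    else:
        tape[head] = write
    return (nxt, head + move, tuple(sorted(tape.items())))


def repeats_configuration(history: List[Config]) -> bool:
    """Terminal if the latest configuration already occurred earlier."""
    return history[-1] in history[:-1]


def visits_states(pattern: Sequence[str]) -> Callable[[List[Config]], bool]:
    """Terminal if the most recent states match the given state sequence."""
    def check(history: List[Config]) -> bool:
        recent = [cfg[0] for cfg in history[-len(pattern):]]
        return recent == list(pattern)
    return check


def run_until_terminal(rules: Rules, start: Config,
                       is_terminal: Callable[[List[Config]], bool],
                       max_steps: int = 1000) -> Optional[List[Config]]:
    """Return the history up to the first terminal sequence, or None."""
    history = [start]
    for _ in range(max_steps):
        nxt = step(history[-1], rules)
        if nxt is None:
            return None  # stopped without producing a terminal sequence
        history.append(nxt)
        if is_terminal(history):
            return history
    return None  # budget exhausted; nothing can be concluded


# Toy machine: write a 1, then loop on a rule that changes nothing.
RULES: Rules = {("a", "0"): ("1", 1, "b"), ("b", "0"): ("0", 0, "b")}
START: Config = ("a", 0, ())

h1 = run_until_terminal(RULES, START, repeats_configuration)
h2 = run_until_terminal(RULES, START, visits_states(["a", "b"]))
print(len(h1), len(h2))  # 3 2
```

The same run ends at different points under the two definitions, which
illustrates why the choice of terminal configuration sequence has to be stated
explicitly alongside any bound.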

4 Busy beavers 

Besides small universal Turing machines, one finds small, yet complicated,
programs in the busy beaver literature. The term busy beaver was introduced by
Rado [83] who put forward a game where the goal for a given k ∈ N is to find,
out of all the k-state, 2-symbol Turing machines, the machine that prints the
most 1s and then halts when started on a blank tape. The busy beaver function
Σ : N → N is then defined by letting Σ(k) be the maximum number of 1s printed
by any halting k-state, 2-symbol Turing machine. Busy beavers essentially
adhere to the standard Turing machine model described in previous sections (one
tape, one head, usual blank symbol, deterministic). It is known that Σ(1) = 1
(trivial), Σ(2) = 4 [83], Σ(3) = 6 [30], and Σ(4) = 13 [6]. However for 5 states
or more the best we currently have are lower bounds. For example, Michel [50]
attributes Σ(5) ≥ 4098 to Marxen and Buntrock [47], and Σ(6) ≥ 3.5 × 10^18267 to
Pavel Kropitz. S(k), the step-counting analogue of Σ(k), is also considered. In
fact, both Σ and S grow faster than any computable function [83]. Green [16]
has given a lower bound on the growth of the function Σ.
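
As a small worked example of the definitions above, the following Python sketch
runs the well-known 2-state, 2-symbol busy beaver champion from a blank tape and
counts the 1s it prints and the steps it takes. The transition table is supplied
here for illustration and is not given in this survey; the function name
`run_from_blank` and the convention that the halting transition counts as a step
are assumptions of the sketch.

```python
from typing import Dict, Tuple

# Rules map (state, read symbol) to (write symbol, move, next state).
Rules = Dict[Tuple[str, int], Tuple[int, int, str]]

# The standard 2-state champion; "H" is the halt state.
BB2: Rules = {
    ("A", 0): (1, +1, "B"),
    ("A", 1): (1, -1, "B"),
    ("B", 0): (1, -1, "A"),
    ("B", 1): (1, +1, "H"),
}


def run_from_blank(rules: Rules, start: str = "A", halt: str = "H",
                   max_steps: int = 1_000_000) -> Tuple[int, int]:
    """Run from an all-0 tape; return (number of 1s on the tape, steps taken)."""
    tape: Dict[int, int] = {}
    head, state, steps = 0, start, 0
    while state != halt:
        if steps >= max_steps:
            raise RuntimeError("step budget exhausted (the machine may not halt)")
        write, move, state = rules[(state, tape.get(head, 0))]
        tape[head] = write
        head += move
        steps += 1
    return sum(tape.values()), steps


ones, steps = run_from_blank(BB2)
print(ones, steps)  # 4 6
```

The output agrees with Σ(2) = 4. The step budget is needed because, for larger
state counts, deciding which candidate machines halt is exactly what makes
computing Σ hard.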

The busy beaver problem has been generalised to machines with ℓ > 2 symbols [7],
where Σ(k, ℓ) is the largest number of non-zeros written by any k-state,
ℓ-symbol Turing machine. It has been shown [7,28] that Σ(2,3) = 9. Terry
Ligocki and Shawn Ligocki have shown that Σ(2,4) ≥ 2,050 and Σ(3,3) ≥
374,676,383, and have given lower bounds on a number of other state-symbol
pairs. See Michel's survey [50] for more results.

Although finding busy beavers is somewhat orthogonal to the goal of finding 
small universal Turing machines, there are potential connections between the two 
fields. On the one hand, when designing small universal programs one often has 
to reuse instructions in many different contexts, something which busy beavers 
might also do, so perhaps small instruction sets from one field might be useful 
for the other. On the other hand, proving lower bounds on universal program 
size, and upper bounds on values for the busy beaver function, both involve hefty 
case analyses so once again techniques developed in one field could potentially 
be useful for the other. In particular, the search for busy beavers has produced 
small programs with very complicated behaviour, which lend weight to the idea 
that proving non-universality of such program classes might be difficult.

5 Further work 

There are many avenues for further work; here we highlight a few examples.

Applying computational complexity theory to the area of small universal 
Turing machines allows us to ask a number of questions that are more subtle
than the usual questions about program size. As we move towards the origin in
Figure 1, the universal machines have larger (but polynomial) time overheads.
Can the time overheads in Figure 1 be further improved (lowered)? Can we prove
lower bounds on the simulation time of machines with a given state-symbol pair? 
Proving non-trivial simulation time lower bounds seems like a difficult problem. 
Such results could be used to prove that there is a polynomial trade-off between 
simulation time and universal program size. 

As we move away from the origin, the non-universal machines seem to have 
more power. For example Kudlek's classification of 2-state, 2-symbol machines 
shows that the sets accepted by these machines are regular, with the exception 
of one context-free language (a^n b^n). Can we hope to fully characterise the sets
accepted by non-universal machines (e.g. in terms of complexity or automata 
theoretic classes) with given state-symbol pairs or other program restrictions? 

When discussing the complexity of small machines the issue of encodings 
becomes very important. For example, when proving that the prediction problem 
for a small machine is P-complete [17], the relevant encodings should be in 
logspace, and this is the case for all of the polynomial time machines in Figure 1. 

Of course there are many models of computation that we have not mentioned
where researchers have focused on finding small universal programs. Post's [79] 
tag systems are an interesting example. Minsky [52,53] showed that tag systems 
are universal with deletion number 6. Cocke and Minsky lowered the deletion 
number to 2, by showing that 2-tag systems were universal. They used produc- 
tions (appendants) of length at most 4. Wang [102] further lowered the produc- 
tion length to 3. Recently, De Mol [11] has given a lower bound by showing that 
the reachability (and thus halting) problems are decidable for 2-tag systems 
with 2 symbols; a problem which Post claimed [81] to have solved but never 
published. It would be interesting to find the smallest universal tag systems in
terms of number of symbols, deletion length, and production length. 
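
The three size measures just mentioned (number of symbols, deletion number and
production length) are easy to see in code. The following minimal Python sketch
simulates a tag system; the function name `run_tag_system`, the halting
convention (stop when the word is shorter than the deletion number or the symbol
read has no production) and the step budget are assumptions made for
illustration. The sample system is a well-known 2-tag system from the literature
that computes a Collatz-like function on words of the form a^n; it is not taken
from this survey.

```python
from typing import Dict, List


def run_tag_system(productions: Dict[str, str], word: str,
                   deletion_number: int = 2,
                   max_steps: int = 10_000) -> List[str]:
    """Return the list of words visited, starting with the input word.

    Each step reads the first symbol, deletes the first `deletion_number`
    symbols, and appends the production of the symbol that was read.
    """
    trace = [word]
    for _ in range(max_steps):
        if len(word) < deletion_number or word[0] not in productions:
            return trace
        word = word[deletion_number:] + productions[word[0]]
        trace.append(word)
    raise RuntimeError("step budget exhausted (the system may not halt)")


# 3 symbols, deletion number 2, productions of length at most 3.
COLLATZ = {"a": "bc", "b": "a", "c": "aaa"}

trace = run_tag_system(COLLATZ, "aaa")
# Lengths of the intermediate words that consist only of the symbol a.
print([len(w) for w in trace if set(w) == {"a"}])  # [3, 5, 8, 4, 2, 1]
```

On an input a^n the all-a words trace out n/2 for even n and (3n+1)/2 for odd n,
so even this tiny system has behaviour that is hard to predict in general.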

The space between the non-universal curve and the smallest non-weakly uni- 
versal machines in Figure 1 contains some complicated beasts. These lend weight 
to the feeling that finding new lower bounds on universal program size is tricky. 
Most noteworthy are the weakly and semi-weakly universal machines discussed 
earlier. Table 2 highlights the existence of general models that provably
have fewer states and symbols than the standard universal machines can have (for
example the machines with (states, symbols, dimensions, tapes) of (2,3,2,1) [25],
(1,7,3,1) [25], and (1,2,1,4) [22]). Also of importance are the busy beavers [50] and
small machines of Margenstern [40,41], Baiocchi [3], and Michel [48,49] that live
in this region and simulate iterations of the 3x+1 problem and other Collatz-like
functions. So it seems that there are plenty of animals yet to be tamed.
states        symbols       state-symbol product   author

m             2             2m                     Shannon [95]
2             n             2n                     Shannon [95]
12            6             72                     Takahashi [illegible] (mentioned in [104])
10            6             60                     Ikeno [23] (also appears in [51])
8             6             48                     Watanabe [103] (mentioned in [54])
7             6             42                     Minsky [51]
8             5             40                     Watanabe [104]
[illegible]   [illegible]   [illegible]            Tritter (mentioned in [54])
25            2             50                     Minsky [illegible]
[illegible]   [illegible]   [illegible]            Minsky [54]
7             4             28                     Minsky [54]
24            2             48                     Rogozhin [87,88,91]
2             21            42                     Rogozhin [87,88]
11            3             33                     Rogozhin [87,88]
3             10            30                     Rogozhin [87,88]
7             4             28                     Rogozhin [87,88,91]
5             5             25                     Rogozhin [87,88,91]
4             6             24                     Rogozhin [87,88,91]
2             18            36                     Rogozhin [91]
10            3             30                     Rogozhin [89,91]
3             10            30                     Rogozhin [90,91]*
22            2             44                     Rogozhin [92]
19            2             38                     Baiocchi [4]
7             4             28                     Baiocchi [4]*
3             9             27                     Kudlek & Rogozhin [27]
18            2             36                     Neary & Woods [66]
9             3             27                     Neary & Woods [69]
5             5             25                     Neary & Woods [69]*
6             4             24                     Neary & Woods [69]
15            2             30                     Neary & Woods [69]



Table 1. Small standard universal Turing machines, ordered by date and then by 
state-symbol product. If there are multiple machines with the same state-symbol 
pair, the machine with the smallest number of instructions is denoted *. 
states  symbols  dimensions  tapes  author

15      2        1           3      Moore [60]†
6       5        1           1      Watanabe [104]†
1       2        1           4      Hooper [21,22]†
2       3        1           2      Hooper [21,22]
7       3        1           1      Watanabe (mentioned in [105,70])†
5       4        1           1      Watanabe [105]†
8       4        2           1      Wagner [100]
2       7        2           1      Ottmann [73]†
10      2        2           1      Ottmann [74,25]†
6       3        2           1      Ottmann [74,25]‡
4       4        2           1      Ottmann [74,25]‡
2       6        2           1      Kleine-Büning & Ottmann [25]‡
2       5        2           1      Kleine-Büning & Ottmann [25]‡
2       3        2           1      Kleine-Büning & Ottmann [25]‡
1       7        3           1      Kleine-Büning & Ottmann [25]‡
4       5        2           1      Kleine-Büning & Ottmann [25]
3       6        2           1      Kleine-Büning & Ottmann [25]
10      2        2           1      Kleine-Büning [24]
2       5        2           1      Kleine-Büning [24]
2       4        2           1      Priese [82]
4       2        2           1      Gajardo et al. [15]
2       2        2           1      Priese [82]Δ
2       5        1           1      Margenstern & Pavlotskaya [45]*
4       7        1           1      Pavlotskaya [78]*
2       3        1           1      Margenstern & Pavlotskaya [46]*
7       2        1           1      Eppstein (published by Cook [9])‡
4       3        1           1      Cook [9] & Wolfram [107]‡
3       4        1           1      Cook [9] & Wolfram [107]‡
2       5        1           1      Cook [9] & Wolfram [107]‡
6       2        1           1      Neary & Woods [68]‡
3       3        1           1      Neary & Woods [68]‡
2       4        1           1      Neary & Woods [68]‡
3       7        1           1      Woods & Neary [111]†
4       5        1           1      Woods & Neary [111]†
2       13       1           1      Woods & Neary [111]†

Table 2. Small non-standard universal Turing machines. Semi-weak machines
are denoted by †, weak machines by ‡, machines coupled with a finite automaton
by *, and a machine with 2 tape heads by Δ.